Disambiguating Verbs by Collocation: Corpus Lexicography meets Natural Language Processing
Abstract
This paper reports the results of Natural Language Processing (NLP) experiments in semantic parsing, based on a new semantic resource, the Pattern Dictionary of English Verbs (PDEV) (Hanks, 2013). The work is part of the DVC (Disambiguating Verbs by Collocation) project, which aims to expand PDEV to a large scale. The project springs from a long-term collaboration between lexicographers and computer scientists, which has given rise to the design and maintenance of specific, adapted, and user-friendly editing and exploration tools. Particular attention is paid to the use of deep semantic NLP methods to help in data processing. Possible contributions of NLP include pattern disambiguation, the focus of this article. The article explains how PDEV differs from other lexical resources and describes its structure in detail. It also presents new classification experiments on a subset of 25 verbs. The SVM model obtained a micro-average F1 score of 0.81.
Similar resources
A Synchronous Corpus-Based Study of Verb-Noun Fluidity in Chinese
The problem of verb-noun categorial ambiguity is critical and relatively unique for non-inflectional languages, especially Chinese. We consider the verb-noun categorial fluidity a continuum and any categorial shift a transitional process. A synchronous corpus-based study was conducted to compare the phenomenon with respect to news texts collected from Hong Kong, Beijing, and Taiwan. It was foun...
The Application of Fuzzy Logic to Collocation Extraction
Collocations are important for many natural language processing tasks, such as information retrieval, machine translation, and computational lexicography. So far, many statistical methods have been used for collocation extraction. Almost all of these methods form a classical crisp set of collocations. We propose a fuzzy logic approach to collocation extraction to form a fuzzy set of collocations in ...
International Workshop Natural Language Processing Methods and Corpora in Translation, Lexicography, and Language Learning
TerminoWeb is a web-based platform designed to find and explore specialized domain knowledge on the Web. An important aspect of this exploration is the discovery of domain-specific collocations on the Web and their presentation in a concordancer to provide contextual information. Such information is valuable to a translator or a language learner presented with a source text containing a specifi...
SemEval-2015 Task 15: A CPA dictionary-entry-building task
This paper describes the first SemEval task to explore the use of Natural Language Processing systems for building dictionary entries, in the framework of Corpus Pattern Analysis. CPA is a corpus-driven technique which provides tools and resources to identify and represent unambiguously the main semantic patterns in which words are used. Task 15 draws on the Pattern Dictionary of English Verbs ...
On Delexicalization Features of Light Verbs in Mandarin
Delexicalization is the tendency of meaning, reflecting the structural feature of light verbs. It emerges from collocation usages of semantic sharing. Based on the corpus and exemplified with “ ” and “ ”, the current paper addresses the delexicalizing features of Chinese light verbs, while proposing a lexicographic approach concerning the conventionality of light verb construction and endeavori...
Publication date: 2014